ABSTRACT


We present a generalized multivariate seismic event identification method, Regularized Discrimination Analysis (RDA) [Friedman 1989], that can be applied to a large number of regional discriminants. RDA is readily adaptable to either an outlier-detection or a classical discrimination approach to regional seismic identification. RDA is designed to address the problems associated with linear (LDA) and quadratic (QDA) discrimination in small-sample, high-dimensional settings, and its parameterization includes LDA, QDA, and Euclidean-distance-based nearest-neighbor discrimination as special cases. RDA can be used to transition from an outlier-analysis approach to seismic identification to classical discrimination as quality explosion calibration data are collected. Further, RDA provides the statistical structure to model highly correlated seismic measurements. We demonstrate the importance of including the correlation structure between seismic measurements in event identification: omitting this correlation structure from any identification framework can aggravate identification errors and give an erroneous impression of capability. With RDA, a large number of amplitudes from a Magnitude and Distance Amplitude Correction (MDAC) analysis [see Taylor et al. 1999] can be used, and no a priori sub-selection of amplitudes (or discriminants) is necessary.
OBJECTIVE

Not accounting for the dependence between individual seismic discriminants in any identification process can aggravate identification errors and give an erroneous impression of capability. For example, weight is positively correlated with height. An individual who is 6 feet tall and weighs 120 lb might reasonably be viewed as unusual. However, taken alone, it is not unusual to find someone who is 6 feet tall, nor is it unusual for someone to weigh 120 lb. It is the inconsistency with the correlated behavior of height and weight that makes a 6-foot, 120-lb person an outlier.

Two discriminants with a strong positive correlation provide redundant information about the source of a seismic event (strongly correlated discriminants vary together, in a probabilistic sense). For example, if X and Y are discriminants with standard deviations $\sigma_X$ and $\sigma_Y$ and correlation $\rho$, then the variance of X + Y is

$$\mathrm{Var}(X+Y) = \sigma_X^2 + \sigma_Y^2 + 2\rho\,\sigma_X\sigma_Y. \tag{1}$$
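As a quick numerical check of Equation 1, the sketch below compares the formula with simulated bivariate normal draws and shows how much the variance of a sum is understated when a correlation of 0.9 is ignored. It assumes NumPy; the helper `var_sum` and all parameter values are illustrative and are not taken from the study data.

```python
import numpy as np

def var_sum(sigma_x, sigma_y, rho):
    """Equation (1): variance of X + Y for discriminants with correlation rho."""
    return sigma_x**2 + sigma_y**2 + 2.0 * rho * sigma_x * sigma_y

rng = np.random.default_rng(0)
sigma_x, sigma_y = 1.0, 2.0          # fabricated standard deviations
for rho in (-0.5, 0.0, 0.9):
    cov = [[sigma_x**2, rho * sigma_x * sigma_y],
           [rho * sigma_x * sigma_y, sigma_y**2]]
    sample = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)
    simulated = np.var(sample[:, 0] + sample[:, 1])
    print(f"rho={rho:+.1f}  formula={var_sum(sigma_x, sigma_y, rho):.3f}  "
          f"simulated={simulated:.3f}")

# Ratio of the true variance of the sum to the variance assumed when the
# correlation (0.9, unit variances) is ignored.
understatement = var_sum(1.0, 1.0, 0.9) / var_sum(1.0, 1.0, 0.0)
print(f"true/assumed variance: {understatement:.2f}")  # 1.90
```

The simulated and formula values agree to within Monte Carlo error, and the variance grows linearly as the correlation increases.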

Var(X + Y) increases linearly in $\rho$. It is reasonable to conjecture that many regional discriminants will be positively correlated. Combining correlated discriminants with a sum, and computing the variance as if they were uncorrelated, may therefore not be a good aggregation method. For a measured event discriminant X = x, a p-value can be computed as the conditional probability, given the source population, of observing a discriminant value equal to or more extreme than x. It is important to note that a p-value is a random variable because it is a function of a random variable. For observed discriminants X and Y, we can compute the p-values $p_X$ and $p_Y$ and then aggregate this marginal information with the product $p_{\mathrm{Aggregate}} = p_X\,p_Y$, a rule that implicitly treats X and Y as independent. However, aggregating with a product can also be dangerous. For example, for bivariate normal random variables X, Y with means $\mu_X$ and $\mu_Y$, the variance of the product XY is

$$\mathrm{Var}(XY) = \mu_Y^2\sigma_X^2 + \mu_X^2\sigma_Y^2 + 2\rho\,\mu_X\mu_Y\sigma_X\sigma_Y + (1+\rho^2)\,\sigma_X^2\sigma_Y^2. \tag{2}$$

Here, $\rho$ is the linear correlation between X and Y. Var(XY) is a quadratic polynomial in $\rho$ that is strictly positive for every admissible $\rho \in [-1, 1]$ (Var(XY) > 0). On that interval, its minimum occurs at $\rho = -\mu_X\mu_Y/(\sigma_X\sigma_Y)$ when this value lies in $[-1, 1]$, and otherwise at the nearer endpoint, 1 or -1. Whether an increase in $\rho$ increases or decreases Var(XY) is therefore governed by the position of $\rho$ relative to $-\mu_X\mu_Y/(\sigma_X\sigma_Y)$. This dependence will propagate into p-value calculations as well.
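Equation 2 and the location of its minimum can be checked the same way. In this sketch, which assumes NumPy, `var_product`, `argmin_rho`, and the parameter values are hypothetical illustrations. Because Var(XY) is a convex quadratic in $\rho$, its minimizer on $[-1, 1]$ is the stationary point clipped to that interval.

```python
import numpy as np

def var_product(mu_x, mu_y, sigma_x, sigma_y, rho):
    """Equation (2): Var(XY) for bivariate normal X, Y with correlation rho."""
    return (mu_y**2 * sigma_x**2 + mu_x**2 * sigma_y**2
            + 2.0 * rho * mu_x * mu_y * sigma_x * sigma_y
            + (1.0 + rho**2) * sigma_x**2 * sigma_y**2)

def argmin_rho(mu_x, mu_y, sigma_x, sigma_y):
    """Minimizer of Var(XY) over rho in [-1, 1]: the stationary point
    -mu_x*mu_y/(sigma_x*sigma_y), clipped to the admissible interval."""
    return float(np.clip(-mu_x * mu_y / (sigma_x * sigma_y), -1.0, 1.0))

# Fabricated parameters and a Monte Carlo check of Equation (2).
mu_x, mu_y, sigma_x, sigma_y, rho = 0.5, -0.3, 1.0, 0.8, 0.4
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
rng = np.random.default_rng(1)
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=400_000).T
print(var_product(mu_x, mu_y, sigma_x, sigma_y, rho), np.var(x * y))

rho_star = argmin_rho(mu_x, mu_y, sigma_x, sigma_y)
print(rho_star)
```

For these values the stationary point lies inside the interval; with large means of the same sign (for example `argmin_rho(2.0, 2.0, 1.0, 1.0)`), it is clipped to -1.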

One of the main points of this paper is that seismic discriminants should be aggregated, to the greatest degree possible, with a statistical likelihood or probability model. A likelihood-based approach to combining discriminants provides a rigorous method to properly account for correlation between discriminants. The most desirable event identification framework would be composed of discriminants that are independent of each other, yet strongly indicative of the source of a seismic event. Independent discriminants contribute in a purely additive (orthogonal) way to the identification of an event and never carry redundant information. For example, a principal components analysis (PCA) can be used to construct linear combinations of amplitudes that are orthogonal. In the traditional application of PCA, a subset of these linear combinations (dimension reduction) is used to construct a discrimination rule. The use of PCA in discrimination analysis has no limiting statistical deficiencies; however, we feel that the PCA approach presents some seismological concerns. Any feature selection analysis on amplitudes, including PCA, de facto constructs discriminants that may or may not have a known physical basis. Moreover, principal components are ranked by the variance they explain, not by their ability to separate source populations, so the leading two or three PCA linear combinations may not be the best discriminants for a Mahalanobis-distance-based discrimination rule (see McLachlan 1992, page 197, for a statistical basis for this observation).
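The ranking issue can be seen in a fabricated two-discriminant example. The earthquake population has most of its variance along the first axis, while a hypothetical explosion mean is offset along the second, low-variance axis. A PCA of the earthquake data ranks the first axis highest, yet nearly all of the earthquake-versus-explosion separation lies along the second component. NumPy is assumed, and every value is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
cov_eq = np.diag([4.0, 0.25])                  # fabricated earthquake covariance
eq = rng.multivariate_normal([0.0, 0.0], cov_eq, size=2000)
mu_ex = np.array([0.0, 1.5])                   # hypothetical explosion mean

# PCA of the earthquake data, components sorted by decreasing variance.
evals, evecs = np.linalg.eigh(np.cov(eq, rowvar=False))
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Separation of the explosion mean from the earthquake mean along each
# component, in units of that component's standard deviation.
scores = evecs.T @ (mu_ex - eq.mean(axis=0))
separation = np.abs(scores) / np.sqrt(evals)
print("fraction of variance:", evals / evals.sum())
print("separation (SDs):", separation)
```

A dimension reduction that keeps only the first component would discard almost all of the discriminating information in this toy case.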

In the regional setting, a PCA will likely be based on earthquake data with no explosion data. This means that the PCA linear projections may do a poor job of combining explosion amplitude information, because $\Sigma_{EX} \neq \Sigma_{EQ}$ (in other words, important earthquake-versus-explosion discriminants may not be included in the final PCA linear amplitude combinations). If all the PCA linear combinations from $\Sigma_{EQ}$ are used in an outlier analysis, then there will be no loss of information; however, this approach is conceptually equivalent to the regularized discrimination analysis (RDA) we present. RDA does not construct potentially controversial linear combinations as in a PCA. In a regional setting, achieving independent discriminants without a PCA-type analysis is probably not possible, because different seismic phases may share similar apparent source spectra and may overlap in time (e.g., Lg spectra may be contaminated by Sn coda). Figure 1 illustrates a fabricated model for two discriminants X and Y.
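The statement that using all of the principal components loses no information can be checked numerically: the squared Mahalanobis distance equals the sum of the squared principal component scores, each divided by its eigenvalue. This minimal sketch assumes NumPy and uses fabricated correlated data; keeping only the leading components can only shrink the distance.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 4
a = rng.normal(size=(p, p))
true_cov = a @ a.T + 0.1 * np.eye(p)          # fabricated correlated structure
eq = rng.multivariate_normal(np.zeros(p), true_cov, size=500)
mu, s = eq.mean(axis=0), np.cov(eq, rowvar=False)
d = rng.normal(size=p) - mu                    # deviation of an arbitrary test event

# Squared Mahalanobis distance computed directly.
d2_direct = d @ np.linalg.solve(s, d)

# Same quantity from all principal components (eigh returns ascending eigenvalues).
evals, evecs = np.linalg.eigh(s)
contrib = (evecs.T @ d) ** 2 / evals           # one term per component
d2_all_pcs = contrib.sum()
d2_top_two = contrib[-2:].sum()                # only the two largest-variance components
print(d2_direct, d2_all_pcs, d2_top_two)
```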

Figure 1a gives the bivariate ellipse that encloses 95% of the data from a particular source (the gray points). The Gaussian curves on the top and right sides of Figure 1a are the marginal densities for this model. The black point on the graph is clearly not a member of the population of gray points; however, neither of the marginal representations indicates that this point is unusual. Figures 1b and 1c show a transition from the region (black) that will include almost all of the gray data to an outlier region. The circle superimposed on Figure 1b represents a source elimination rule that is constructed by assuming no correlation between X and Y. The ellipse superimposed on Figure 1c represents a source elimination rule that is constructed with the inclusion of correlation between X and Y. The most disturbing observation is the potential for identification errors when the no-correlation rule is used. In this case, the region outside of a decision rule defines false alarms, and the darker region interior to a decision rule defines missed explosions.
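A numerical version of the Figure 1 situation can be sketched with a fabricated model having unit variances and correlation 0.9; the test point is hypothetical and NumPy is assumed. Each marginal z-score is within ±1.96, and the no-correlation (circle) rule accepts the point, while the correlation-based (ellipse) rule flags it. Both rules use the 95% chi-squared quantile with 2 degrees of freedom, which equals $-2\ln(0.05)$.

```python
import numpy as np

rho = 0.9
cov = np.array([[1.0, rho], [rho, 1.0]])     # fabricated model, unit variances
point = np.array([1.5, -1.5])                # hypothetical event: unusual only jointly

z = point / np.sqrt(np.diag(cov))            # marginal z-scores
crit = -2.0 * np.log(0.05)                   # 95% chi-squared quantile, 2 d.o.f.

d2_circle = np.sum(z ** 2)                          # correlation ignored
d2_ellipse = point @ np.linalg.solve(cov, point)    # correlation included
print("marginal z-scores:", z)
print("circle rule flags outlier: ", bool(d2_circle > crit))   # False
print("ellipse rule flags outlier:", bool(d2_ellipse > crit))  # True
```

Here the circle rule gives a squared distance of 4.5, below the critical value of about 5.99, while the ellipse rule gives 45.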


Figure 1. Fabricated model for discriminants X and Y, with correlation-based and no-correlation-based decision rules.


OUTLIER DETECTION ANALYSIS

The methodology for outlier detection in the seismic context is well established (see Fisk et al. 1996; Taylor and Hartse 1997). Classical multipopulation discrimination methodologies may not be well suited to nuclear test monitoring for two main reasons. First, existing or planned seismic stations that will be used for monitoring have little or no nuclear explosion data with which to adequately characterize the statistical distribution of the nuclear explosion population. Note that industrial mining explosions are not necessarily a good surrogate upon which to base discriminants for nuclear explosions. Second, even if a set of nuclear explosion data exists, it is likely to be limited; that is, a small number of events from a given test site, detonated under standard containment conditions (as opposed to potentially evasive conditions). Such nuclear explosion data may not be suitable for deriving population statistics used in broad-area monitoring. Comparison of nuclear explosion data from different test sites, or even from within a single test site (e.g., the Nevada Test Site), illustrates the complex effects of near-source nonlinear material properties and emplacement conditions on seismic discriminants (see Taylor 1991; Taylor and Denny 1991).

As noted in Fisk et al. (1996), for a large number of calibration earthquakes, the likelihood ratio outlier test statistic is essentially the multivariate normal density function MVN($\mu_{EQ}$, $\Sigma_{EQ}$). Specifically, for a vector of discriminants x, if

$$f_X(x) = \left|2\pi\Sigma_{EQ}\right|^{-1/2} \exp\left\{-\tfrac{1}{2}\,(x-\mu_{EQ})'\,\Sigma_{EQ}^{-1}\,(x-\mu_{EQ})\right\} \tag{3}$$

is close to zero, then the data x indicate an outlier; otherwise they indicate an earthquake. Here, $\mu_{EQ}$ and $\Sigma_{EQ}$ are estimated with the calibration data, thus establishing the outlier rule for future events. The term close to zero is defined by a critical value $\xi$ that serves as a point of reference for evaluated values of $f_X(x)$ and is determined by the tolerable false-outlier rate $\alpha$. If $\mu_{EQ}$ and $\Sigma_{EQ}$ are assumed known and x is a vector of p variables, then $P(f_X(x) < \xi)$ has the closed form given in Equation 3-b.


Figure 2. Likelihood-based outlier rule. The rule is illustrated in two dimensions with discriminants $x_1$ and $x_2$ and is easily extended to p dimensions with a p-dimensional likelihood function. (a) Discriminants $x_1$ and $x_2$ are interior to a $100(1-\alpha)\%$ probability region, which translates to a large likelihood value. (b) Discriminants $x_1$ and $x_2$ are outside the $100(1-\alpha)\%$ probability region, which translates to a likelihood value near zero.

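The random variable $(x-\mu_{EQ})'\,\Sigma_{EQ}^{-1}\,(x-\mu_{EQ})$ follows a chi-squared distribution with p degrees of freedom (Rencher 1998, Theorem 2.2F); thus

$$P\left(f_X(x) < \xi\right) = 1 - F_{\chi^2_p}\!\left(-2\ln\!\left(\xi\,\left|2\pi\Sigma_{EQ}\right|^{1/2}\right)\right), \qquad \xi \in \left(0, \left|2\pi\Sigma_{EQ}\right|^{-1/2}\right], \tag{3-b}$$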


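where $F_{\chi^2_p}(\cdot)$ is the chi-squared cumulative distribution function with p degrees of freedom. To determine the value of $\xi$, we solve for $\xi$ in the equation $P(f_X(x) < \xi) = 1 - F_{\chi^2_p}\left(-2\ln\left(\xi\,|2\pi\Sigma_{EQ}|^{1/2}\right)\right) = \alpha$, which gives $\xi = |2\pi\Sigma_{EQ}|^{-1/2}\exp\{-\tfrac{1}{2}\chi^2_{p,1-\alpha}\}$, where $\chi^2_{p,1-\alpha}$ is the $1-\alpha$ quantile of the chi-squared distribution with p degrees of freedom. Thus an outlier rule based on $f_X(x)$ is equivalent to an outlier rule based on $(x-\mu_{EQ})'\,\Sigma_{EQ}^{-1}\,(x-\mu_{EQ})$. The outlier rule $f_X(x)$, in two dimensions, is illustrated in Figure 2. Note that this rule is a one-sided test. In this paper, we use an outlier rule based on Equation 3 and illustrated in Figure 2.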
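The closed-form critical value can be sketched as follows, assuming NumPy and a fabricated "known" covariance. To avoid depending on a statistics library, the chi-squared CDF is evaluated with the power series of the regularized lower incomplete gamma function and inverted by bisection; `chi2_cdf`, `chi2_quantile`, `critical_density`, and `mvn_density` are hypothetical helper names. The final lines check by simulation that roughly a fraction $\alpha$ of earthquake draws fall below $\xi$.

```python
import math
import numpy as np

def chi2_cdf(x, df):
    """Chi-squared CDF: regularized lower incomplete gamma P(df/2, x/2) via its series."""
    if x <= 0.0:
        return 0.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (s + n)
        total += term
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total

def chi2_quantile(q, df):
    """Invert chi2_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_det_2pi(cov):
    """log |2 pi Sigma|."""
    return np.linalg.slogdet(2.0 * np.pi * cov)[1]

def critical_density(cov, alpha):
    """xi solving P(f_X(x) < xi) = alpha (Equation 3-b)."""
    p = cov.shape[0]
    return math.exp(-0.5 * log_det_2pi(cov) - 0.5 * chi2_quantile(1.0 - alpha, p))

def mvn_density(x, mu, cov):
    """Equation (3): multivariate normal density at x."""
    d = x - mu
    return math.exp(-0.5 * log_det_2pi(cov) - 0.5 * d @ np.linalg.solve(cov, d))

# Fabricated check: about alpha of the earthquake draws should fall below xi.
rng = np.random.default_rng(4)
p, alpha = 7, 0.05
a = rng.normal(size=(p, p))
cov = a @ a.T + np.eye(p)
mu = np.zeros(p)
xi = critical_density(cov, alpha)
draws = rng.multivariate_normal(mu, cov, size=20_000)
rate = np.mean([mvn_density(x, mu, cov) < xi for x in draws])
print(xi, rate)
```

In practice, a library chi-squared quantile function could replace the two series helpers; they are written out here only to keep the sketch self-contained.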

RIDGE DISCRIMINATION AND REGULARIZED DISCRIMINANT ANALYSIS

The optimal regional discrimination method needs to be stable, robust, and simple, and it should have a well-grounded physical basis. As regional seismic research continues, an optimal regional discrimination method will be developed. Ultimately, that method may use only two or three features from a seismic wave. What is currently desirable is a technique that properly aggregates all available seismic information from a suite of phase measurements. Classical discrimination does this because it is essentially based on the formation of a likelihood ratio, and statistical likelihood functions can properly combine phase amplitudes.

In the Gaussian case, the likelihood requires a covariance matrix of the phase amplitudes, and as discussed above, these phase amplitudes may be strongly correlated. This will lead to a nearly singular covariance matrix, which in turn will give a likelihood function with unstable statistical properties. There has been some very useful research on this problem within the statistics community (see Smidt and McDonald 1976; DiPillo 1976, 1977, and 1979; Randles et al. 1978; Loh 1995, 1997; and Campbell 1980). In general terms, this research studied the utility of a ridge adjustment to the covariance matrix in discrimination analysis. Aki and Richards (1980) describe this type of adjustment as the stochastic inverse in seismological inverse problems; the adjustment is also similar to damped least squares (Aki and Richards 1980). In the statistical literature, this approach is known as ridge discrimination. In a preliminary study, we have embedded these ideas into the event source elimination approach to regional discrimination, and our initial studies have produced some positive results.

Ridge Discrimination

Ridge discrimination was proposed as a method of addressing the problem of near-singular covariance matrices in Gaussian linear (LDA) and quadratic (QDA) discrimination. In ridge discrimination, the covariance matrix of the kth group, used in an LDA or QDA application, is an additive combination of the sample covariance and the identity matrix. The weighting in this addition is governed by a smoothing parameter $\lambda$, and a common $\lambda$ is used across all groups. Formally, the ridge discrimination covariance matrix for the kth group is

$$\Sigma_k(\lambda) = (1-\lambda)\,S_k + \lambda\,\frac{\mathrm{tr}(S_k)}{p}\,I, \qquad \lambda \in [0,1]. \tag{4}$$

Here, $\mathrm{tr}(S_k)$ is the trace of the sample covariance matrix $S_k$, and p is the dimension of the amplitude vector. The covariance matrix $\Sigma_k(\lambda)$ is essentially formed by scaling $S_k$ by $1-\lambda$ and adding a $\lambda$ proportion of the average eigenvalue of $S_k$, $\mathrm{tr}(S_k)/p$, to its diagonal elements; the trace is unchanged. Equation 4 is equivalent to Equation 12.132 in Aki and Richards (1980), with $\varepsilon$ (Aki and Richards) and $\lambda$ playing analogous roles.
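Equation 4 is a one-line computation. The sketch below shows how a small $\lambda$ sharply reduces the condition number while leaving the trace unchanged, in the same spirit as damped least squares. It assumes NumPy; `ridge_covariance` and the fabricated amplitudes (seven bands driven by two latent factors plus small noise) are illustrative only.

```python
import numpy as np

def ridge_covariance(s, lam):
    """Equation (4): (1 - lam) * S + lam * (tr(S) / p) * I, with lam in [0, 1]."""
    p = s.shape[0]
    return (1.0 - lam) * s + lam * (np.trace(s) / p) * np.eye(p)

# Fabricated, highly correlated "amplitudes": 7 bands driven by 2 latent factors.
rng = np.random.default_rng(5)
latent = rng.normal(size=(60, 2))
loadings = rng.normal(size=(2, 7))
amps = latent @ loadings + 0.01 * rng.normal(size=(60, 7))
s = np.cov(amps, rowvar=False)

lambdas = (0.0, 0.01, 0.05, 0.15)
conds = [np.linalg.cond(ridge_covariance(s, lam)) for lam in lambdas]
for lam, c in zip(lambdas, conds):
    print(f"lambda={lam:.2f}  condition number={c:.3g}")
```

Because every eigenvalue $e_i$ of $S_k$ maps to $(1-\lambda)e_i + \lambda\,\mathrm{tr}(S_k)/p$, the condition number decreases monotonically toward 1 as $\lambda$ grows.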

As a preliminary study of this approach, we have performed a leave-one-out Monte Carlo outlier analysis with highly correlated regional discriminant data. This type of Monte Carlo study can also be used to select optimal features for outlier detection. First, we fix a value of $\lambda$. With n earthquake events, a leave-one-out cross-validation involves n steps. In step i, the ith event is removed from the earthquake data. This event is used as the test case, and all other earthquake data are used to construct the covariance $S_{EQ}$ and the mean $\bar{x}_{EQ}$. We then construct the covariance $\Sigma_{EQ}(\lambda)$. The $S_{EQ}$ and $\bar{x}_{EQ}$ are then used to generate a large number of simulated discriminants, and for each simulated data point we evaluate the multivariate normal (MVN) density using $\Sigma_{EQ}(\lambda)$ and $\bar{x}_{EQ}$ (i.e., MVN($\bar{x}_{EQ}$, $\Sigma_{EQ}(\lambda)$)). We need to do this because we use the MVN($\mu_{EQ}$, $\Sigma_{EQ}(\lambda)$) density, with $\mu_{EQ}$ estimated by $\bar{x}_{EQ}$, as the outlier rule, and these simulated data can be used to define a critical value $\xi$ for the rule. The value $\xi$ serves as the critical value to classify an event as earthquake or outlier. In our study, we use the 5th percentile ($\alpha = 0.05$) of the MVN($\bar{x}_{EQ}$, $\Sigma_{EQ}(\lambda)$) density values obtained from the simulated discriminants as the critical value $\xi$.

If the MVN($\bar{x}_{EQ}$, $\Sigma_{EQ}(\lambda)$) density, evaluated with discriminants from an unknown event, is greater than $\xi$, then we conclude that the event is an earthquake. If the density is smaller than $\xi$, then the data are in the extreme regions of the density support, and we call the event an outlier. We evaluate the test case with this rule. Repeating this process for all n of the earthquake data gives a leave-one-out cross-validated error rate for the fixed $\lambda$ value. We then fix another value of $\lambda$ and repeat the cross-validation analysis.
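The leave-one-out procedure above can be sketched as follows. This is an illustration under stated assumptions, not the study's code. NumPy is assumed, and simulated discriminants are drawn from MVN($\bar{x}_{EQ}$, $S_{EQ}$). The rule is evaluated on the log density, a monotone transform that leaves classifications unchanged and avoids underflow in many dimensions. `ridge_covariance`, `log_mvn`, and `loo_false_outlier_rate` are hypothetical names, and the earthquake data are fabricated.

```python
import numpy as np

def ridge_covariance(s, lam):
    """Equation (4)."""
    p = s.shape[0]
    return (1.0 - lam) * s + lam * (np.trace(s) / p) * np.eye(p)

def log_mvn(x, mu, cov):
    """Log of the MVN density (Equation 3), evaluated for each row of x."""
    d = np.atleast_2d(x) - mu
    logdet = np.linalg.slogdet(2.0 * np.pi * cov)[1]
    maha = np.sum(d * np.linalg.solve(cov, d.T).T, axis=1)
    return -0.5 * logdet - 0.5 * maha

def loo_false_outlier_rate(eq, lam, alpha=0.05, n_sim=2000, seed=0):
    """Fraction of left-out earthquakes that the fixed-lambda rule calls outliers."""
    rng = np.random.default_rng(seed)
    flagged = 0
    for i in range(eq.shape[0]):
        train = np.delete(eq, i, axis=0)                  # step i: hold out event i
        xbar, s_eq = train.mean(axis=0), np.cov(train, rowvar=False)
        cov_lam = ridge_covariance(s_eq, lam)
        sims = rng.multivariate_normal(xbar, s_eq, size=n_sim)
        xi = np.quantile(log_mvn(sims, xbar, cov_lam), alpha)   # critical value
        if log_mvn(eq[i], xbar, cov_lam)[0] < xi:               # earthquake flagged?
            flagged += 1
    return flagged / eq.shape[0]

# Fabricated, correlated earthquake discriminants (n = 80, p = 7).
rng = np.random.default_rng(8)
a = rng.normal(size=(7, 7))
eq = rng.multivariate_normal(np.zeros(7), a @ a.T + 0.05 * np.eye(7), size=80)
rates = {lam: loo_false_outlier_rate(eq, lam) for lam in (0.0, 0.05, 0.15, 0.5)}
print(rates)
```

The dictionary maps each fixed $\lambda$ to its cross-validated false-outlier rate, which is the quantity compared across $\lambda$ values in the study.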

The data used in this preliminary study consist of amplitudes of Pn, Pg, Sn, and Lg taken in seven different frequency bands and corrected for source and propagation effects (see Taylor et al. 1999). The data consist of 412 earthquakes and 4 explosions. For each source, the data are normalized to the low-frequency Lg amplitude.

Four sets of seven amplitudes are analyzed in this paper. Each set of seven amplitudes was selected from 27 available amplitudes (four phases in seven frequency bands give 28 amplitudes; the low-frequency Lg amplitude was used to normalize the others, leaving 27). Additionally, this data set has four explosions that were used as test cases in all of the steps of the cross-validation study. In our study, we compute a cross-validated false-outlier rate; however, we cannot reasonably estimate a missed-explosion rate because we have only four explosions.

We have noted that a very small value of $\lambda$ can be used to obtain a covariance $\Sigma_{EQ}(\lambda)$ with an acceptable condition number (the ratio of the largest to the smallest eigenvalue). In Figure 3, we summarize the distribution of the test values $f_X(x)$ with trade-off plots. The ordinate is the 5th percentile of $f_X(x)$ evaluated at the test amplitude data points, and the abscissa is the interquartile range (IQR) of $f_X(x)$. Each point is labeled with its corresponding $\lambda$ value. Because all of the test values are computed from earthquake amplitudes, we want a distribution of test values that will optimally indicate earthquake. In terms of the trade-off plots in Figure 3, we want the 5th percentile of the outlier test statistic $f_X(x)$ to be as large as possible. Note that a small increase in the value of $\lambda$ from zero generally decreases the IQR and increases the 5th percentile, up to an optimum. These are desirable distributional properties because they indicate that the distribution is shifting away from zero and that its variability is decreasing. This property is further illustrated with fabricated boxplots in Figure 4 and with a summary plot of regional data in Figure 5. Eventually, larger $\lambda$ values cause the 5th percentile to move toward zero with a continued mild decrease in the IQR. Again, this is not desirable, because small test values indicate an outlier. The main points of this discussion are that a mild increase in $\lambda$ away from zero gives a covariance $\Sigma_{EQ}(\lambda)$ with an acceptable condition number, and that there is an optimal value of $\lambda$ that gives test value ($f_X(x)$) distribution properties that minimize false-outlier rates. As shown in Figure 5, for an optimal $\lambda$, the ability to detect the explosions as outliers to the earthquake population is excellent. Thus, in this preliminary study, we have observed that $\lambda$ can have an optimal value that achieves, in the mean square error sense, a minimal false-outlier rate. Other research in support of this observation can be found in Peck et al. (1988). These methods show promise as a way to address the problem of high correlation among seismic discrimination measurements. Ridge discrimination is especially appealing when it is generalized to regularized discrimination analysis.
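The two summaries plotted in Figure 3 can be computed for a grid of $\lambda$ values as sketched below. For brevity, this hypothetical `tradeoff_points` helper uses a single train/test split of fabricated data rather than the leave-one-out scheme described earlier, and it assumes NumPy. For each $\lambda$, it returns the IQR (abscissa) and the 5th percentile (ordinate) of the test values $f_X(x)$.

```python
import numpy as np

def ridge_covariance(s, lam):
    """Equation (4)."""
    p = s.shape[0]
    return (1.0 - lam) * s + lam * (np.trace(s) / p) * np.eye(p)

def mvn_density(x, mu, cov):
    """Equation (3), evaluated for each row of x."""
    d = np.atleast_2d(x) - mu
    logdet = np.linalg.slogdet(2.0 * np.pi * cov)[1]
    maha = np.sum(d * np.linalg.solve(cov, d.T).T, axis=1)
    return np.exp(-0.5 * logdet - 0.5 * maha)

def tradeoff_points(train, test, lambdas):
    """Map each lambda to (IQR, 5th percentile) of the test values f_X(x)."""
    xbar, s = train.mean(axis=0), np.cov(train, rowvar=False)
    points = {}
    for lam in lambdas:
        vals = mvn_density(test, xbar, ridge_covariance(s, lam))
        q5, q25, q75 = np.percentile(vals, [5, 25, 75])
        points[lam] = (q75 - q25, q5)
    return points

# Fabricated earthquake amplitudes: strongly correlated bands plus small noise.
rng = np.random.default_rng(9)
latent = rng.normal(size=(300, 3))
data = latent @ rng.normal(size=(3, 7)) + 0.1 * rng.normal(size=(300, 7))
train, test = data[:200], data[200:]
for lam, (iqr, q5) in tradeoff_points(train, test, [0.0, 0.05, 0.15, 0.3]).items():
    print(f"lambda={lam:.2f}  IQR={iqr:.3g}  5th percentile={q5:.3g}")
```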
Figure 3. Trade-off plots of the test value data (MVN($\mu_{EQ}$, $\Sigma_{EQ}(\lambda)$)) evaluated with the cross-validation earthquake amplitudes. The value of $\lambda$ appears in red, and the number (n) of cross-validation data used to construct the plot appears at the bottom of each pair of plots.


Figure 4. Illustration of the distributional properties of the test statistic $f_X(x)$ as a function of $\lambda$. An increase in the value of $\lambda$ away from zero generally decreases the IQR and increases the 5th percentile, up to an optimum. These are desirable distributional properties because they indicate that the distribution is shifting away from zero and that its variability is decreasing. Eventually, larger $\lambda$ values cause the 5th percentile to move toward zero with a continued mild decrease in the IQR. The shaded region represents the critical region that defines an outlier.
Figure 5. Test statistic $f_X(x)$ plotted against $m_b$ for Group No. III in Figure 3. In this summary plot, all earthquake events were used to estimate $\mu_{EQ}$ and $\Sigma_{EQ}(\lambda)$. The optimal $\lambda$ value is 0.15. Based on these data, this value will give a minimum IQR and a 5th percentile that is optimally above zero.

Regularized Discrimination Analysis

Regularized discrimination analysis (RDA) was proposed by Friedman (1989) as a method of discrimination for applications with highly correlated discriminants and small training samples for some classification groups. Friedman's generalization of ridge discrimination involves the construction of a weighted-average covariance matrix

$$S_k(\gamma) = (1-\gamma)\,S_k + \gamma\,S, \qquad \gamma \in [0,1]. \tag{5}$$

Here, $S_k$ is the computed covariance matrix for the kth group, and S is the pooled covariance matrix. Note that $S_k$ may be singular due to a small number of training data or strongly correlated variables for the kth group. $S_k(\gamma = 0)$ is computed from the kth group data alone (QDA), and $S_k(\gamma = 1)$ is the pooled covariance (LDA). RDA uses a two-parameter formulation of the covariance matrix in forming discrimination rules. With $S_k(\gamma)$ defined above, the RDA covariance matrix is

$$\Sigma_k(\lambda,\gamma) = (1-\lambda)\,S_k(\gamma) + \lambda\,\frac{\mathrm{tr}\left(S_k(\gamma)\right)}{p}\,I, \qquad \lambda \in [0,1],\ \gamma \in [0,1]. \tag{6}$$

Note that this is simply the ridge discrimination formulation with $S_k$ replaced by $S_k(\gamma)$. Here, $\lambda$ and $\gamma$ are the same values across all groups. Higbee's (1994) generalization allows $\lambda$ and $\gamma$ to change from group to group.
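Equations 5 and 6 can be sketched together as one function, assuming NumPy; `rda_covariance` and the data are hypothetical. The example mimics a station that has recorded only four events in seven dimensions, so its own covariance $S_k$ is singular. Pooling toward a covariance estimated from many regional events, and then shrinking toward a scaled identity, yields a positive-definite matrix. For simplicity, the pooled matrix here is the covariance of all regional events rather than a formal within-group pooled estimate.

```python
import numpy as np

def rda_covariance(s_k, s_pooled, lam, gamma):
    """Equations (5) and (6); lam and gamma both lie in [0, 1]."""
    p = s_k.shape[0]
    s_k_gamma = (1.0 - gamma) * s_k + gamma * s_pooled               # Equation (5)
    return ((1.0 - lam) * s_k_gamma
            + lam * (np.trace(s_k_gamma) / p) * np.eye(p))            # Equation (6)

rng = np.random.default_rng(6)
p = 7
station = rng.normal(size=(4, p))      # fabricated: 4 events at one station
region = rng.normal(size=(200, p))     # fabricated: many events across the region
s_k = np.cov(station, rowvar=False)
s_pooled = np.cov(region, rowvar=False)

print("rank of S_k:", np.linalg.matrix_rank(s_k))    # at most 3, so S_k is singular
sigma = rda_covariance(s_k, s_pooled, lam=0.1, gamma=0.5)
print("smallest eigenvalue:", np.linalg.eigvalsh(sigma).min())
```

Setting `lam=0, gamma=0` returns $S_k$ itself (the QDA choice), and `lam=0, gamma=1` returns the pooled matrix (the LDA choice).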

RDA theory can potentially be integrated into an outlier analysis approach to event identification or used as a classical seismic discrimination method. For outlier analysis, pooling (S) could potentially occur across seismic stations within a geophysically homogeneous region, where each station may have observed only a small number of seismic events. An RDA-type covariance would then be constructed for each station (Equation 6). We are researching techniques for optimally choosing values of $\lambda$ and $\gamma$ for $\Sigma_k(\lambda,\gamma)$ when only earthquake data are available.

There are some very appealing features of RDA. As noted in Friedman (1989), RDA reduces to quadratic discrimination for $\lambda = 0$ and $\gamma = 0$. For $\lambda = 0$ and $\gamma = 1$, RDA reduces to linear discrimination. Other extremes in the RDA parameters give nearest-neighbor and weighted nearest-neighbor type discrimination methods. In the nearest-neighbor case, $\lambda = 1$ and $\gamma = 1$, which gives $\Sigma_k(1,1) = (\mathrm{tr}(S)/p)\,I = \omega I$, with $\omega$ a constant. For the weighted nearest-neighbor case, $\lambda = 1$ and $\gamma = 0$, which gives $\Sigma_k(1,0) = (\mathrm{tr}(S_k)/p)\,I = \omega_k I$, with $\omega_k$ a constant. In both of these cases, the discrimination function is based on the Euclidean distance to each group mean (a nearest-mean rule) rather than on the Mahalanobis distance. However, in the weighted nearest-neighbor case, the kth term in the discrimination function, corresponding to the kth group, is weighted by $\omega_k$: the squared Euclidean distance is divided by $\omega_k$, and a term $p\ln\omega_k$ is added. These observations are summarized in Figure 6. RDA also addresses some of the inadequacies of QDA. In particular, QDA usually requires larger sample sizes than LDA and is quite sensitive to model violations (Friedman 1989).
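The $\lambda = 1$ special cases can be checked with a quadratic discriminant score (Mahalanobis term plus log-determinant, with equal priors assumed). In this sketch, which assumes NumPy and uses hypothetical helper names and fabricated groups, the $(\lambda, \gamma) = (1, 1)$ covariance is $\omega I$, so the score ranks groups exactly as Euclidean distance to the group means does. For $(\lambda, \gamma) = (1, 0)$, each group's squared Euclidean distance is divided by $\omega_k$ and $p\ln\omega_k$ is added.

```python
import numpy as np

def rda_covariance(s_k, s_pooled, lam, gamma):
    """Equations (5) and (6)."""
    p = s_k.shape[0]
    s_k_gamma = (1.0 - gamma) * s_k + gamma * s_pooled
    return (1.0 - lam) * s_k_gamma + lam * (np.trace(s_k_gamma) / p) * np.eye(p)

def rda_score(x, mean_k, cov_k):
    """Quadratic discriminant score with equal priors (smaller means closer)."""
    d = x - mean_k
    return d @ np.linalg.solve(cov_k, d) + np.linalg.slogdet(cov_k)[1]

rng = np.random.default_rng(7)
p = 3
groups = [rng.normal(loc=m, size=(30, p)) for m in (0.0, 2.0)]   # fabricated groups
means = [g.mean(axis=0) for g in groups]
covs = [np.cov(g, rowvar=False) for g in groups]
s_pooled = sum((len(g) - 1) * c for g, c in zip(groups, covs)) / (
    sum(len(g) for g in groups) - len(groups))
x = rng.normal(loc=1.2, size=p)                                   # hypothetical event
euclid = [np.sum((x - m) ** 2) for m in means]

# (lam, gamma) = (1, 1): omega * I for every group, i.e. a nearest-mean rule.
omega = np.trace(s_pooled) / p
nm = [rda_score(x, m, rda_covariance(c, s_pooled, 1.0, 1.0)) for m, c in zip(means, covs)]
print(np.argmin(nm) == np.argmin(euclid))

# (lam, gamma) = (1, 0): omega_k * I, so distance / omega_k + p * ln(omega_k).
omega_k = [np.trace(c) / p for c in covs]
w = [rda_score(x, m, rda_covariance(c, s_pooled, 1.0, 0.0)) for m, c in zip(means, covs)]
expected = [e / o + p * np.log(o) for e, o in zip(euclid, omega_k)]
print(np.allclose(w, expected))
```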

RDA provides a rich and adaptable family of discrimination methods that appear to be very applicable to the regional seismic problem. RDA readily provides a statistical framework for using MDAC amplitudes (see Taylor et al. 1999) in regional seismic discrimination. We also note that $\Sigma_k(\lambda,\gamma)$ can be used as the covariance matrix in negative evidence methods (Anderson et al. 1999). In fact, RDA can be the foundation for any enhanced discrimination framework that is based on the use of statistical likelihood functions. For the classical discrimination problem (calibration data for all seismic sources), optimal values of $\lambda$ and $\gamma$ are identified with cross-validated error rates. The details of this procedure can be found in Friedman (1989).
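A brute-force version of choosing $\lambda$ and $\gamma$ by cross-validated error rate is sketched below. It is a simplification, not Friedman's (1989) exact procedure: NumPy is assumed, priors are taken as equal, and every grid point is refit by plain leave-one-out. `fit_rda`, `classify`, `loo_error`, and the two fabricated groups (a larger "earthquake" group and a small "explosion" group) are hypothetical.

```python
import itertools
import numpy as np

def rda_covariance(s_k, s_pooled, lam, gamma):
    """Equations (5) and (6)."""
    p = s_k.shape[0]
    s_k_gamma = (1.0 - gamma) * s_k + gamma * s_pooled
    return (1.0 - lam) * s_k_gamma + lam * (np.trace(s_k_gamma) / p) * np.eye(p)

def fit_rda(groups, lam, gamma):
    """Group means and RDA covariances; S is the pooled within-group covariance."""
    means = [g.mean(axis=0) for g in groups]
    covs = [np.cov(g, rowvar=False) for g in groups]
    n_total = sum(len(g) for g in groups)
    s_pooled = sum((len(g) - 1) * c for g, c in zip(groups, covs)) / (n_total - len(groups))
    return means, [rda_covariance(c, s_pooled, lam, gamma) for c in covs]

def classify(x, means, covs):
    """Index of the group with the smallest quadratic discriminant score (equal priors)."""
    scores = [(x - m) @ np.linalg.solve(c, x - m) + np.linalg.slogdet(c)[1]
              for m, c in zip(means, covs)]
    return int(np.argmin(scores))

def loo_error(groups, lam, gamma):
    """Leave-one-out misclassification rate for fixed (lam, gamma)."""
    errors = total = 0
    for k, g in enumerate(groups):
        for i in range(len(g)):
            held = [np.delete(h, i, axis=0) if j == k else h for j, h in enumerate(groups)]
            means, covs = fit_rda(held, lam, gamma)
            errors += classify(g[i], means, covs) != k
            total += 1
    return errors / total

rng = np.random.default_rng(10)
p = 4
a = rng.normal(size=(p, p))
quakes = rng.multivariate_normal(np.zeros(p), a @ a.T + 0.1 * np.eye(p), size=40)
blasts = rng.multivariate_normal(np.full(p, 1.5), np.eye(p), size=8)
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
cv_error = {(lam, gam): loo_error([quakes, blasts], lam, gam)
            for lam, gam in itertools.product(grid, grid)}
best = min(cv_error, key=cv_error.get)
print("best (lambda, gamma):", best, "LOO error:", cv_error[best])
```

With very small groups, ties in the cross-validated error are common; `min` then simply keeps the first grid point reached, and a finer grid or a tie-breaking rule may be preferable.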
Figure 6. The relationship between regularized discrimination analysis (RDA) and some other classical discrimination methods.

CONCLUSIONS AND FURTHER DEVELOPMENTS

We have illustrated the importance of properly modeling the correlation structure of seismic measurements in the seismic event identification problem. If this correlation is not captured in the mathematics of a discrimination method, then both false-alarm and missed-explosion rates can be aggravated. Empirically, we have noted that false-alarm errors can be seriously increased when discriminant correlations are not properly modeled. In the regional discrimination problem, seismic measurements will be strongly correlated (e.g., Lg spectra may be contaminated by Sn coda, and amplitudes may be constructed with frequency bands that overlap). This poses another problem, in that an estimate of a covariance matrix for these measurements will be nearly singular; the true, unknown covariance matrix is itself nearly singular. Ridge discrimination and its generalization, RDA, provide a statistical method that can perform well in the presence of highly correlated seismic measurements. These methods properly combine measurements through a covariance matrix and are mathematically adaptable to a variety of regional seismic identification settings. We believe that regularized discrimination provides a reasonable solution to the regional discrimination problem and provides the flexibility to adapt to a future maturation of seismic event identification.

If an outlier detection algorithm is based on the use of statistical likelihood functions, then in an operational setting, missing data may be accounted for using detection thresholds with negative evidence methods (Anderson et al. 1999). However, outlier detectors are constructed with ground-truth earthquake data, which can be left-censored due to poor signal-to-noise ratios, particularly at high frequencies. Woodward et al. (1999) show the utility of filling missing ground-truth earthquake data with the EM algorithm, with a marked decrease in error rates. Anderson and Phillips (1999) have developed an approach to incorporate censoring thresholds associated with missing spatial data into Kriging parameter estimates. We are adapting the work of Woodward et al. (1999) and Anderson and Phillips (1999) to the outlier detector approach presented in this paper. In particular, future developments include:

• the development of methods to incorporate amplitude censoring thresholds into the EM algorithm to fill missing earthquake training data,

• the development of a method of optimally choosing the RDA parameters $\lambda$ and $\gamma$ using only earthquake training data (RDA concepts integrated into outlier analysis), and

• the comparison of optimal $\lambda$ and $\gamma$ determined with explosion and earthquake training data with optimal $\lambda$ and $\gamma$ determined only with earthquake training data.